Using Crowdsourcing to Improve Profanity Detection
Abstract
Profanity detection is often thought to be an easy task. However, past work has shown that current list-based systems perform poorly. First, they fail to adapt to evolving profane slang and to identify profane terms that have been disguised or only partially censored (e.g., @ss, f$#%) or intentionally or unintentionally misspelled (e.g., biatch, shiiiit). For these reasons, they are easy to circumvent and have very poor recall. Second, they are a one-size-fits-all solution, assuming that the definition, use, and perception of profane or inappropriate language hold across all contexts. In this article, we present work that attempts to move beyond list-based profanity detection systems by identifying the context in which profanity occurs. The proposed system uses a set of comments from a social news site labeled by Amazon Mechanical Turk workers for the presence of profanity. This system far surpasses the performance of list-based profanity detection techniques. The use of crowdsourcing in this task suggests an opportunity to build profanity detection systems tailored to sites and communities.
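To make the recall problem concrete, the following is a minimal sketch of a list-based detector, not the system described in the abstract. The word list, the tokenizer, and the function name `list_based_detect` are illustrative assumptions; the test strings reuse the disguised and misspelled forms quoted above.

```python
import re

# Hypothetical, deliberately tiny word list (an assumption for illustration).
# Real list-based systems use far longer lists but match in the same way.
PROFANITY_LIST = {"ass", "bitch", "shit"}


def list_based_detect(comment: str) -> bool:
    """Return True if any alphabetic token exactly matches a listed term."""
    # Lowercase the comment and keep only runs of the letters a-z as tokens.
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(token in PROFANITY_LIST for token in tokens)


if __name__ == "__main__":
    examples = [
        "that is shit",   # exact match, detected
        "what an @ss",    # partially censored, missed
        "biatch please",  # intentional misspelling, missed
        "shiiiit",        # elongated spelling, missed
    ]
    for text in examples:
        print(f"{text!r}: {list_based_detect(text)}")
```

Only the first example is flagged. The other three slip through because exact lookup has no notion of character substitution, respelling, or repeated letters, which is the circumvention problem the abstract attributes to list-based systems.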
Similar resources
Rephrasing Profanity in Chinese Text
This paper proposes a system that can detect and rephrase profanity in Chinese text. Rather than just masking detected profanity, we want to revise the input sentence by using inoffensive words while keeping its original meaning. Twenty-nine such rephrasing rules were devised after observing sentences on real-world social websites. The overall accuracy of the proposed system is 85.56%.
Good Clean Fun? A Content Analysis of Profanity in Video Games and Its Prevalence across Game Systems and Ratings
Although violent video game content and its effects have been examined extensively by empirical research, verbal aggression in the form of profanity has received less attention. Building on preliminary findings from previous studies, an extensive content analysis of profanity in video games was conducted using a sample of the 150 top-selling video games across all popular game platforms (includ...
Profanity in media associated with attitudes and behavior regarding profanity use and aggression.
OBJECTIVE: We hypothesized that exposure to profanity in media would be directly related to beliefs and behavior regarding profanity and indirectly to aggressive behavior. METHODS: We examined these associations among 223 adolescents attending a large Midwestern middle school. Participants completed a number of questionnaires examining their exposure to media, attitudes and behavior regarding p...
Perform Three Data Mining Tasks with Crowdsourcing Process
In data mining studies, performing feature selection by hand is complex, so some of the labeling has to be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...
Analyzing Labeled Cyberbullying Incidents on the Instagram Social Network
Cyberbullying is a growing problem affecting more than half of all American teens. The main goal of this paper is to study labeled cyberbullying incidents in the Instagram social network. In this work, we have collected a sample data set consisting of Instagram images and their associated comments. We then designed a labeling study and employed human contributors at the crowd-sourced CrowdFlowe...
Publication date: 2012